Open Domain Short Text Conceptualization: A Generative + Descriptive Modeling Approach

Authors

  • Yangqiu Song
  • Shusen Wang
  • Haixun Wang
Abstract

Concepts embody the knowledge that facilitates our cognitive processes of learning. Mapping short texts to a large set of open-domain concepts has proven successful in many applications. In this paper, we unify the existing conceptualization methods from a Bayesian perspective and discuss three modeling approaches: descriptive, generative, and discriminative models. Motivated by an analysis of their advantages and shortcomings, we develop a generative + descriptive modeling approach. Our model considers term relatedness in context, which results in disambiguated conceptualization. We report short text clustering results on a news title data set and a Twitter message data set, and demonstrate the effectiveness of the developed approach compared with state-of-the-art conceptualization and topic modeling approaches.
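The Bayesian view of conceptualization can be illustrated with its simplest member, a naive Bayes conceptualizer that scores each concept c for a set of terms t_1, ..., t_M by P(c) ∏ P(t_i | c). This is a minimal sketch under stated assumptions, not the generative + descriptive model developed in the paper: unlike that model, it treats terms as conditionally independent given the concept and does not model term relatedness. The concept priors, term probabilities, the smoothing constant `EPSILON`, and the function name `conceptualize` are all hypothetical illustrations.

```python
import math
from typing import Dict, List

# Hypothetical toy knowledgebase (NOT real statistics from any system):
# concept prior P(c) and term likelihood P(t | c).
PRIOR: Dict[str, float] = {"company": 0.5, "fruit": 0.5}
LIKELIHOOD: Dict[str, Dict[str, float]] = {
    "company": {"apple": 0.3, "microsoft": 0.4, "ipad": 0.2},
    "fruit": {"apple": 0.5, "banana": 0.4},
}
EPSILON = 1e-6  # assumed smoothing value for terms a concept never generates


def conceptualize(terms: List[str]) -> Dict[str, float]:
    """Return the posterior P(c | terms) under a naive Bayes assumption."""
    log_scores: Dict[str, float] = {}
    for concept, prior in PRIOR.items():
        # log P(c) + sum_i log P(t_i | c)
        score = math.log(prior)
        for term in terms:
            score += math.log(LIKELIHOOD[concept].get(term, EPSILON))
        log_scores[concept] = score
    # Normalize with the log-sum-exp trick to avoid numerical underflow.
    top = max(log_scores.values())
    total = sum(math.exp(s - top) for s in log_scores.values())
    return {c: math.exp(s - top) / total for c, s in log_scores.items()}


if __name__ == "__main__":
    print(conceptualize(["apple"]))          # ambiguous: leans toward "fruit"
    print(conceptualize(["apple", "ipad"]))  # context shifts mass to "company"
```

With "apple" alone, the toy posterior is 0.375 for company and 0.625 for fruit; adding "ipad" pushes almost all of the mass to company. This shows how context terms can disambiguate a concept even in this simple model, which the paper's approach refines by also accounting for how related the terms are to each other.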


Similar resources

Short Text Conceptualization Using a Probabilistic Knowledgebase

Most text mining tasks, including clustering and topic detection, are based on statistical methods that treat text as bags of words. Semantics in the text is largely ignored in the mining process, and mining results often have low interpretability. One particular challenge faced by such approaches lies in short text understanding, as short texts lack enough content from which statistical conclu...


The Domain of the semantics of ‘promise’ in the Holy Quran

Semantics is the branch of linguistics that analyzes the meaning of the words and sentences of a text and identifies parts of speech with regard to meaning. This descriptive-analytic study examines the meaning of ‘promise’ in the Holy Quran based on principles of semantics, using a collocation approach and a library research methodology. Also, by virtue of ...


Evaluating Generative Models for Text Generation

Generating human-quality text is a challenging problem because of the ambiguity of meaning and the difficulty of modeling long-term semantic connections. Recurrent Neural Networks (RNNs) have shown promising results in this problem domain, with the most common approach to its training being to maximize the log predictive likelihood of each true token in the training sequence given the previously observ...


Conceptualization of business excellence model with a grand theory approach

This study aims to conceptualize a business excellence model and identify its variables and indicators. Its philosophical foundation is pragmatic, drawing on the humanistic theory of symbolic interactionism; the research is qualitative, and the grand theory strategy involves open, axial, and selective coding, whose output is a new concept. Data collection is based on documentation studies on excellence models. The ...


Modeling Multi-Terminal Networks in Power Systems Using the Vector Fitting Method for the Analysis of High-Frequency Electromagnetic Transients

Modeling of frequency-dependent components of power system is often based on a terminal description by an admittance matrix in the frequency domain. One challenge in the extraction of state-space models from such data is to prevent possible error magnification when the model is to be applied in time-domain simulations. The error magnification is a consequence of inaccurate representation of sma...




Publication date: 2015